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ABSTRACT 

In astrophysical (inverse) regression problems it is an important task to decide whether 
a given parametric model describes the observational data sufficiently well or whether a 
non-parametric modelling becomes necessary. However, in contrast to common prac- 
tice this cannot be decided by solely comparing the quality of fit due to possible 
over-fitting by the non-parametric method. Therefore, in this paper we present a re- 
sampling algorithm which allows to decide whether deviations between a parametric 
and a non-parametric model are systematic or due to noise. The algorithm is based on 
a statistical comparison of the corresponding residuals, under the assumption of the 
parametric model as well as under violation of this assumption. This yields a graphical 
tool for a robust decision making of parametric versus non-parametric modelling. 

Moreover, our approach can be used for the selection of the most proper model 
among several competitors (model selection) . The methods are illustrated by the prob- 
lem of recovering the luminosity density in the Milky Way [MW] from near-infrared 
[NIR] surface brightness data of the DIRBE experiment on board of the COBE satel- 
lite. Among the parametric models investigated one with 4-armed spiral structure 
performs best. In this model the Sagittarius-Carina arm and its counter-arm are sig- 
nificantly weaker than the other pair of arms. Furthermore, we find statistical evidence 
for an improvement over a range of parametric models with different spiral structure 
morphologies by a non-parametric model of Bissantz & Gerhard (2002). 

Key words: methods: data analysis - methods: statistical - Galaxy: disc - Galaxy: 
structure. 



1 INTRODUCTION 

One of the basic problems in astrophysical research con- 
sists in the proper choice of a model to describe an obser- 
vational array ui h s (ti),i = 1, . . . , N of TV measurements. 
Here t = (it, . . . ,tjv) denotes a quantity which affects 
lu = (u;(£i), . . . , w(tjv)) in a systematic, but blurred way, 
ui{ti), the regression function to be reconstructed from the 
data. For example, w bs(ii) could be measurements of the 
surface brightness at sky position ti = (h,bi). In noisy in- 
verse models ui itself is not the quantity of primary interest, 
rather a function p has to be recovered, where the relation 
between lo and p is given by a (linear) operator K (matrix), 
viz. 

u(U) = (Kp)(U), i = l,...,N. 



Often K is given by a TV x TV matrix, which will in gen- 
eral be numerically difficult to invert. This will also be the 
case in this paper where we are concerned with the recov- 
ery of the spatial (three-dimensional) luminosity density of 
a galaxy from blurred observations of its surface brightness 
and reverberation mapping of gas in AGNs. For more ex- 
amples of inverse problems in astrophysics see e.g. Lucy 
(1994). Due to the noisy measurements it is tempting to 
assume that u) bs(ti) = oj(ti) + Si, where the e, denote some 
random noise and oj(ti) the expected value of L0 o h s (ti), i.e. 
E[u) b B (ti)] — co(ti). In particular we allow for different error 
distributions of the Ei, which entails inhomogeneous vari- 
ance patterns (cf. Hocking, 1996, for a thorough discussion 
of models with inhomogeneous variances), viz. V[sj] = of, 
as will be the case in our example of de-projecting the de- 
reddened COBE/DIRBE L-band surface brightness map of 
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Spergel et al. (1995). Cf. Bissantz & Munk, (2001) [BM1] 
for the noise properties of this data. 

In this paper we are concerned with a new method to com- 
pare several competing models for the regression function p 
and to select the most appropriate one. These models may 
be of a certain parametric form (parametric model) 
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or non-parametric, i.e. only qualitative smoothness or ge- 
ometric assumptions such as symmetry or differentiability, 
(cf. Wand & Jones (1995) or Wahba (1990) for a good in- 
troduction to non-parametric modelling by kernel or spline 
methods, respectively) are made a-priori. Implicitly, any al- 
gorithm, to reconstruct p from to, relies on those assump- 
tions or combinations thereof. Statistical methods for model 
selection are broadly used in astrophysics (cf. Feigelson & 
Babu, 2002), a good statistical introduction into this area 
is Burnham & Anderson (1998) or Eubank (1999) among 
many others. The case of inhomogeneous inverse models, as 
in our application, is not treated explicitely in the literature 
so far. 

Our approach is based on the statistical comparison of the 
estimated parametric residuals 



= oj obs (ti) - {Kp$) (U), 



= 1, 



, N, 



where p^ denotes the best possible fit of the model class U 
to the data w bs (e.g. obtained by least squares), and the 
estimated non-parametric residuals 

# p) = w b.(«i) - (ifp (np) ) (*<), i = l,...,N, 

where p' 1115 - 1 denotes a non-parametric reconstruction of p. 
This estimator can be obtained by several methods, includ- 
ing penalized maximum likelihood if the error distribution 
of e is known (e.g. Magorrian et al., (1998), Bissantz & 
Gerhard, 2002), the Richardson- Lucy iterative method (e.g. 
Binney, Gerhard & Spergel, 1997 [BGS]), subtractive opti- 
mally localized averages (e.g. Pijpers & Thompson, 1994, 
Pijpers & Wanders, 1994), and maximum entropy methods 
(e.g. Wallington et al, 1994, 1996). 

In our example, which will be discussed in detail in Sect. 
4 and 5, a penalized maximum likelihood method is used, 
where the penalty terms encourage symmetry, smoothness 
features and spiral structure of the recovered density distri- 
bution. In this model the error distribution is assumed to be 
independent of the data point. 

Now the general methodology will be to use the difference 
of the residuals i, and e,- np ^ 



-(np) 



(Kp 6 -Kp^)( ti ). 



Roughly speaking, a small sum of squares of Si will indicate 
that the non-parametric and the parametric fit are close, 
which should give evidence for the parametric model to hold. 
Otherwise, if Y] Sf is large the non-parametric fit outper- 
forms the parametric one. Due to the possibly inhomoge- 
neous variance pattern of the error of, a valid statistical 



analysis requires that Si has to be weighted with the esti- 
mated local variability of = o 2 (^), which results in a locally 
weighted residual sum of squares 

N /„ _(„p)N 2 



LWRSS 



N ^ 



Observe, that the use of the locally weighted residual 
sums of squares leads to a quantity which does not de- 
pend on the (unknown) local variability of the data, i.e. 



E 



-(np) 



will be a quantity which is independent 



of of for large N. In fact, we claim that under certain regu- 
larity conditions on K, the error distribution of e, a and 
p, iVV ft- jv LWRSS has a normal limit, where ft at is a se- 
quence tending to zero as N — > co. Hence the distribution of 
iVVftiv LWRSS for large numbers of observations tends to be 
asymptotically normal with a rather complicated variance 
which will depend on K, a and the smoothing method used 
for obtaining p. For the case of direct regression (then K is 
the identity) this was made explicite by Dette (1999) and 
Dette & Munk (2002), where we claim that the proof in the 
indirect case follows a similar pattern, and is postponed to 
a different paper. In order to base a proper decision whether 
p$ is acceptable or p^ np ' should be preferred, the approxi- 
mate normal distribution of Ny/hw LWRSS can now be used. 
The only problem which remains is to determine the vari- 
ance of N\/tiN LWRSS which will be done in the sequel by 
a resampling (bootstrap) algorithm from the data. Various 
simulation studies have shown that this method even leads 
to a better approximation of the true distribution of RSS (for 
finite N) than the asymptotic (N — > co) normal law. This 
is in accordance with work of Dette, von Lieres und Wilkau 
& Sperlich (2001), who investigated various variants of this 
algorithm in a different context and came to the same con- 
clusion. Our bootstrap algorithm will be presented in Sect. 
3. 

In Sect. 5 we apply the algorithm to the analysis of a dust- 
corrected near-infrared [NIR] COBE/DIRBE L-band map 
of the Milky Way [MW] (Spergel et al., 1995). Bissantz & 
Gerhard (2002) modelled this data non-parametrically with 
an implementation of the penalized maximum likelihood 
method. We find that their model improves significantly on 
various parametric models constructed from the parametric 
model of the same data [BGS], supplemented by different 
spiral structure models (cf. Sect. 4 for a description). By 
our method it can be concluded that this improvement in 
fit is not due to overfitting of the data, but rather to system- 
atic departures between data and parametric models, which 
are in particular not flexible enough to capture certain de- 
viations from a double-exponential disk and smooth spiral 
arms. 

We mention that our method can be used as well to de- 
cide between several classes of concurring parametric mod- 
els (with possibly different numbers of parameters). Among 
the parametric models with different spiral structure that we 
have investigated, a four-armed model with the Sagittarius- 
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Carina arm (and its counter-arm) significantly weaker than 
the other arms (cf. Drimmel & Spergel, 2001), outperforms 
its competitors. 

Finally, we mention that the main difference of our method 
to previous work of the authors ([BM1], Bissantz & Munk, 
2002 [BM2]) consists in three important aspects. First, 
LWRSS adapts to local variability estimated from data, 
which yields an overall measure of goodness of fit weighting 
the impact of data due to its local variability. Our numerical 
analysis shows that this yields much more reliable results as 
before. Second, we do not require an additional smoothing 
step for the residual differences as in [BM2]. This smooth- 
ing step had to be introduced in [BM1] in order to obtain 
a distributional limit of the statistic considered there. How- 
ever, due to this additional smoothing the statistics becomes 
much harder to interpret. In the present approach this is 
not necessary anymore, and our statistics LWRSS measures 
asymptotically the L 2 -distance between the true model p 
and the parametric model pa, i.e. 

M 2 — min \ \K(p — pv)\\ 2 

where || ■ || refers to the cuclidian norm. Furthermore, our 
new method also allows comparison with a non-parametric 
competitor, which serves as an objective alternative. 

For those readers, who are mainly interested in the method- 
ological part of this paper we recommend to skip Sect. 4. 



2 A NEW STATISTICAL METHOD TO 

DECIDE BETWEEN PARAMETRIC AND 
NON-PARAMETRIC MODELLING 

As mentioned in the introduction the underlying idea of our 
approach is to base the decision, whether the model U should 
be considered as acceptable, on the LWRSS, which has to be 
computed in the following steps. We illustrate our algorithm 
for the case where ti are located on a two-dimensional grid 
as it will be the case in our example in Sect. 4. 

(i) (Computation of the residual difference). Observe, 
that e, -e< np) =u> d -LO (np) . 

(ii) (Computation of a local estimate a 2 of the vari- 
ance a 2 ). Here, several methods are appropriate. A sim- 
ple method is based on Savitzky-Golay filters (Press et al., 
1994), which are common in astrophysics. The method re- 
quires the computation of the differences between a non- 
parametric model and the observed data at 7 neighbouring 
points on a grid: 

2 

k=-2 
2 

^ ci (o; (np) (^ + ;,6j + fc) — w bs{k+ubj+kj) } 

l=-2 

i = 3, . . . , n — 2 and j = 3, . . . , m — 2 



















j 

i 


i 












K 1 ) 



















Figure 1. Local residuals required for estimating the variance 
a 2 . 



where the u! b s (h,bj), j=i''"'m are the observations, the 
c fe and c; are weights chosen according to a Savitzky-Golay 
filter (Press et al., 1994, setting n L = n R = M = 2) and o/ np) 
is a non-parametric model of the Milky Way. 

Further approaches in the estimation of the local variabil- 
ity are possible with methods based on the computation of 
the local differences of, say 4, neighbouring points on a grid 
(cf. Fig. 1): 

l 

= g{ {u ohs (U, bj) - Uob s (li+k, bj) f 
i 

+ (^obsCij&j) - W bs(i«,&j+fc))}, 

fc=-l 

i = 2, ...,n — 1 and j = 2, . . . , m — 1. 

Then, in a second step the average over a window of neigh- 
bouring local residuals a 2 ^ is computed, viz. 

r,s 

~2 _ "2 

a = a ij> 

i,j=—r, — s 

where the window size can be chosen according to prior in- 
formation on constant regions of the expected local variabil- 
ity. 

Better but more computer intensive methods can be ob- 
tained by applying a kernel estimator to the residual squares 
if, see e.g. Ruppert et al. (1997), or biased reduced variance 
estimators (Thompson et al., 1991, Munk et al., 2001). 

(iii) (Compute LWRSS). Now, the distribution of LWRSS 
is required, which is extremely difficult to compute ex- 
plicitely. See, e.g. for the case of a direct regression model, 
Dette (1999). Nevertheless, in the following we describe a 
resampling algorithm which performs well. Numerical inves- 
tigations have shown (cf. Dette, 1999, and Dette et al., 2001) 
that a modification of the wild bootstrap approximates the 
true distribution of LWRSS very well and a bootstrap limit 
law has been proved in Dette & Neumeyer (2001) in a sim- 
ilar but simpler context. Even for rather small numbers of 
observations, iV = 100, say, these authors found in a broad 
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range of scenarios on the error distribution and p satisfac- 
tory results. This is in accordance with our findings. 



3 PERFORMING THE BOOTSTRAP 
ALGORITHM 

Our proposed algorithm is a modification of the algorithm 
presented in Bissantz & Munk (2001), and consists of the 
following steps. 

Step 0: Compute o; (np) . 

Step 1: (Generate random data). Generate N random data 
u>l2s by drawing with replacement from the observed data 

Mobs- 

Step 2: (Fitting of the random data). Determine the "para- 
metrically best-fitting model" p^q of the random data w^ s . 
Step 3: (Compute the target). Compute the LWRSS M ( 2 i} = 
||^(») — ^"^IIlwrss ( see below for the formal definition of 
this distance measure). 

Step 4: (The replication process). Repeat steps 1-3B 
times. This yields values M?i) , • • • , M?b) ■ B is & large num- 
ber, with B w 500 — 1000 usually sufficient. In contrast to 
[BM2] the non-parametric fit oj (np) is kept fixed, whereas the 
parametric fit is randomly perturbed in this algorithm. This 
leads to a significant reduction of computing time. 

Observe that Mm > • • • > ^fs) are realisations of a random 
quantity X — M» , and the cumulative distribution function 
Fg approximates the distribution function F of M 2 . 

This algorithm is in accordance with the algorithm of Dette 
ct al. (2001) who suggested this method in a different con- 
text. Interestingly, these authors came to the same conclu- 
sion, that bootstrapping from the parametric model gives 
much more reliable results than bootstrapping from the non- 
parametric residuals e' np ' ) . This is an empirical finding and 
somehow in contrast to theoretical results. 



4 NEAR-INFRARED MODELS OF THE 
MILKY WAY 

We now apply our proposed method to models of 
COBE/DIRBE NIR data in the following. Such models of 
the distribution of near-infrared luminosity in the inner MW 
are particularly interesting because NIR light traces lumi- 
nous mass well (e.g. Rix & Zaritzky, 1995). COBE/DIRBE 
NIR maps were used to estimate the NIR luminosity distri- 
bution both parametrically (e.g. Freudenreich, 1998, Dwek 
ct al., 1995, and Drimmel & Spergel, 2001), and non- 
parametrically ([BGS], Bissantz, Englmaier, Binney & Ger- 
hard, 1997, and Bissantz & Gerhard, 2002). Only the models 
of Drimmel & Spergel (2001) and Bissantz & Gerhard (2002) 
contain spiral arms. Bissantz & Gerhard (2002) show that 
inclusion of spiral structure is important for the bar/bulge 



of the model, as only then the elongation of the bulge/bar in 
their models is large enough to reproduce clump giant star 
count data of Stanek et al. (1994, 1997). Such star count 
data contain information about the distances to the sur- 
veyed stars, complementary to the all-sky coverage of the 
COBE/DIRBE NIR map; thus the star count data provides 
an important a posteriori test of the model(s). 

However, despite of its importance, the morphology of the 
stellar spiral arms of the Milky Way (i.e. in the distribu- 
tion of luminous mass) is not well known. In particular it is 
unclear whether it is predominantly 2- or 4-armed. Ortiz & 
Lepine (1993) used a 4-armed model with logarithmic spiral 
structure in their model of MW NIR starcounts. However, 
there is a tangent point at « 49 deg in the (very probably 4- 
armed) distribution of gas and dust (Englmaier & Gerhard, 
1999), which seems to be missing in the COBE/DIRBE ri- 
band map of the MW (Drimmel, 2000). This indicates 2- 
armed rather than 4-armed structure. On the other hand 
this tangent is also weak in CO, possibly due to the ge- 
ometry of the line-of-sight through this arm (Dame, private 
communication). In Drimmel & Spergel (2001) the analy- 
sis of the COBE/DIRBE 240pm (tracing dust) and NIR 
(tracing stellar light) data was extended, and a combined 
dust and stellar disk model produced. The authors found 
a best model for the stellar disk which is 4-armed. How- 
ever, the Sagittarius-Carina arm in their model is weaker 
by a factor of 0.4 in arm-interarm density contrast than 
the other arms. Bissantz & Gerhard (2002) found in their 
non-parametric reference model with spiral arms that inclu- 
sion of spiral structure significantly improves their model, 
however they were not able to decide whether 2 or 4-armed 
structure is preferable. 

Here we aim to better understand stellar spiral struc- 
ture in the MW by further analysis of a dust-corrected 
COBE/DIRBE L-band map (Spergel et al. 1995), shown 
in Fig. 2. For our analysis we compare four classes of para- 
metric models based on this data, which differ with regard 
to the spiral structure. We include models with 2-arms, with 
4-arms, without arms and an intermediate model consisting 
of a strong and a weak pair of arms (the latter model as 
suggested by Drimmel & Spergel, 2001). 

We apply our proposed statistical method to perform the 
comparison of the parametric models by using the non- 
parametric reference model of Bissantz & Gerhard (2002) as 
an objective standard of measurement. To this end the P- 
value curves of [BM2] are modified for the LWRSS-statistic. 
Additionally, with our methodology we find that the non- 
parametric model better reproduces the informative part of 
the L-band map than the best parametric model, particu- 
larly in the disk near spiral arm tangent point features (cf. 
Fig. 5). This is achieved by using the distribution of LWRSS 
which is found by the bootstrap algorithm in Sect. 3. In fact 
this shows that there is rare evidence that the difference be- 
tween the parametric and non-parametric fit is solely due 
to noise, rather than due to a systematic departure of the 



© 0000 RAS, MNRAS 000, 000-000 



cn 

CD 



0) 
3 



u 

D 
O 



Non-parametric versus parametric modelling 5 
COBE/DIRBE L-band data 




Galactic longitude [deg] 

Figure 2. The dust-corrected COBE/DIRBE L-band map of Spergel et al. (1995). Contour spacing is 0.5mag 2 and the bold contour is 
at Omag 2 . Note that there is an arbitrary offset to this scale, however it is the same as used in Fig. 5 to allow for a comparison of models 
and data. 



true p and the best parametric model as described in the 
last paragraph. 

First we introduce the parametric models in Sect. 4.1 and 
the non-parametric reference model of Bissantz & Gerhard 
(2002) in Sect. 4.2. 



4.1 The parametric models 

The parametric models are defined on a Galacto-centric 
Cartesian coordinate system with axes x,y,z, where x is 
along the major axis, and y along the minor axis of the 
bulge/bar, both in the main plane of the MW. The position 
of the Sun in this coordinate system is zq — 14 pc above 
the main plane of the disk and Rq = 8 kpc from the Galac- 
tic centre, and the angle between the major axis of the bar 
and the line-of-sight from the Sun to the Galactic Centre is 
<i) bar = 20deg [BGS]. 

The parametric models analysed in this paper are simi- 
lar, except for their spiral structure. The other model con- 
stituents are a double-exponential disk and a truncated 
power- law bulge (cf. [BGS]). Calling the disk density pd and 
the bulge density p b we define the model density p as: 



p(x) = p d (x)+p b (x), 
where 

p d = p° d ■ R d ■ e- R ' R * ■ 



(1) 



-M/zo 



-1*1/* 



2fj 



+ a- 



21 



Pb 



pi 



2 i 2 
-a /a n 



V(a™ (1 + a/ao) 1 - 



and x — (x, y, z) 



Bissantz & Gerhard (2002) added 4-armed spiral structure 
to this density p, according to the model of Ortiz & Lepine 
(1993), and then used the model as initial model in their 
non-parametric de-projection. The positions of the spiral 
arms ri(4>) (i = 1, ... ,4) in the model of Ortiz & Lepine 
(1993) are given by 

n{4>) = 2.33kpc-e w - ¥ ' b "-^ ) ' tan(x) , 

where the angle c/>» = 0, 7r/2, it, Ztx/2 determines the inner- 
most position angle of a spiral arm in Galacto-centric co- 
ordinates with respect to the major axis of the bar, and 
X = 13.8 deg is the pitch angle of the arms. 

In our parametric models the spirals exist between Galac- 
tocentric radius of 3.5 kpc, which is the approximate outer 
extend of the bar/bulge (Bissantz & Gerhard, 2002), and an 
outer radius of 10 kpc. The spiral arms are modelled by a 
Gaussian profile with full width at half maximum ~ 300 pc 
(Ortiz & Lepine, 1993). We treat the spiral arms as enhance- 
ments of the disc density. The only free parameter that we 
fit for the spiral structure is the amplitude d s of the density 
modulations: 



including spiral 
Pd 



=«-n( 



1 +d s 



- ln(2)-Ar|/(0.5-FWHM) 2 



where An is the (approximate) distance to the nearest point 
along spiral arm i. To keep the problem computationally 
tractable we use this rather simple model of the spiral struc- 
ture in the following. Note that Vallee (2002) suggests some 
improvements to the model of Ortiz & Lepine, however there 
is good agreement between these models in the radial range 
3 kpc < r < 6 kpc from the Galactic Center. This is where 
most of the evidence results for spiral structure in the subse- 
quently analysed region of the sky (cf. Sect. 5.1). Also visual 
inspection of the residuals of all parametric models showed 
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no systematic deviations between the position of model spi- 
ral arm tangent points from those in the data. 

We call this 4-armed model "model A" in the subsequent 
analysis. The other parametric models under investigation 
are modifications of model "A". Model "B" has no spiral 
arms at all, and in model "C" we omit the Sag-Car-arm and 
its counter-arm, as suggested by Drimmel (2000). Finally, 
in our "intermediate" model "D" the amplitude d s of the 
Sag-Car arm and its counter-arm are a factor of 0.4 smaller 
than the amplitude of the other pair of arms, as suggested 
by Drimmel & Spergel (2001). 

4.2 The non-parametric model 

As a non-parametric competitor to the above mentioned 
parametric models we use the reference non-parametric lu- 
minosity density model of Bissantz & Gerhard (2002) . It was 
estimated from the dust-corrected COBE/DIRBE L-band 
map of Spergel et al. (1995), using a penalised maximum 
likelihood algorithm with penalty terms that encourage in 
the model density eightfold-symmetry with respect to the 
three main planes of (approximate) symmetry of the bar, 
smoothness, and spiral structure (similar to the 4-armed 
spiral structure of the Ortiz & Lepine, 1993, model). The 
non-parametric model is defined on a (Galactocentric) grid 
of 60x60x41 grid points which extends lOkpc along the x 
and y axis, and 3 kpc along the z-axis. Outside of this box 
the model is continued by a parametric model of the L-band 
map. 

Bissantz & Gerhard (2002) derived non-parametric models 
for bar angles 10 deg < (p hllI < 44 deg. They found a pre- 
ferred range of bar angles 20 deg < ^ b ar < 25 deg. Their 
best model is for bar angle (^bar = 20 deg, and is the "refer- 
ence" model analysed subsequently. 



5 APPLICATION TO MODELS OF THE 
MILKY WAY 

In this section we describe the application of our statistical 
method to MW models A-D. To this end we estimate M 2 , 
which is the distance between the "true" MW density p, and 
the "best-fitting" parametric model p$* , of each of the model 
(classes) A-D. Note that we use the reference non-parametric 
model as an objective standard of measurement. We approx- 
imately determine the statistic of M 2 by bootstrap replica- 
tion of the distance M 2 ^ (in our case i — l,...,w 1000) 
between "best fitting" models pi 1 ' of random data and 
the non-parametric reference model cj' np ' (cf. Sect. 3). The 
random data ar^ is generated from the COBE/DIRBE data 
<^obs by drawing with replacement. We describe in this sec- 
tion the generation of the random sets of data (Sect. 5.1), 
the computation of M?^ (Sect. 5.2), and finally, in Sect. 5.3, 
we explain how to evaluate the resulting statistic of M 2 ^ for 
the parametric models. 



5.1 Random data sets based on the COBE/DIRBE 
data 

We construct random sets of data ur bs from a subset of the 
Spergel et al. (1995) COBE/DIRBE data which is generated 
in a first step as follows: We linearly interpolate the surface 
brightness data (in magnitudes) on a grid Q with N = 4800 
equidistant points, covering \l\ < 40 deg, |6| < 10 deg from 
the observed data. We restrict the data to this area because 
there the parametric continuation of the non-parametric 
model contributes only unimportantly to the projection of 
the model to the sky. 

In the second step we estimate the local noise properties 
of the COBE/DIRBE data on the grid Q. To this end we 
compute a map of the square difference between the pro- 
jection of the non-parametric reference model to the sky 
and the COBE/DIRBE data (both in magnitudes) at those 
positions of the sky where this data is available. Then we 
smooth this map, employing a Savitzky-Golay filter (Press 
et al., 1994, setting ur = til = M = 2 for their parameters). 
This procedure yields a non-parametric estimate of the lo- 
cal variance a 2 (l,b) (cf. Sect. 2). Finally, we determine the 
local variance at the points of grid Q by linear interpolation 
in this map. 

In our final step we construct random data oj^2 s as follows. 
We randomly draw with replacement 4800 points (lj,bj) out 
of the grid Q. The surface brightness observations u) b s {lj, bj) 
together with the estimates of the local variance a 2 (lj,bj), at 
the drawn points {U, bj), then constitute a random sample of 
data. Observe, that drawing with replacement implies that 
a point (lj,bj) may occur more than once in this random 
sample. 

5.2 The distribution of M ( 2 i} 

We approximate the statistic of the distance between the 
best-fitting parametric model p^ for every parametric model 
A-D and the reference non-parametric model as follows. We 
compute B — 1000 bootstrap replications of M 2 ^ , using the 
following procedure for 1 < i < 1000: 

(i) Generation of a random set of data u/ b ' s fr° m the 
COBE/DIRBE L-band data (cf. Sect. 5.1). 

(ii) Determination of the weighted least square estima- 
tor (WLSE) (the best-fitting model) for this random 
data set. For this we use a Marquardt-Levenberg-algorithm 
(Press et al., (1994)), which was used to minimize the dis- 
tance 

(w^((i,6)i)-w*((i,6)i)) 2 
L> * 2 ((l,bh) ■ 

all points (I ,b) j of w . 

For the computation of a 2 see (ii) in Sect. 2. The fitting is 
done in a two-step process: 
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M 

Figure 3. Cumulative distribution functions of M 2 ^ for models 
A (full line), B (dashed), C (dot-dash-dot-dash) and D (dotted). 
The vertical line indicates the distance M 2 of the best-fitting 
parametric model for the original COBE/DIRBE data from 
the non-parametric model. 



1. Fitting of the disk parameters: In the first step we fit 
the disk parameters and the bulge normalisation b, with the 
other bulge parameters fixed. 

2. Fitting of the bulge/bar parameters: In the second step 
we fix the disk related parameters found in the first step 
(except for the normalisation parameter d) and fit the 
bulge/bar parameters and d. 

(iii) Computation of the distance M.%, between the refer- 
ence non-parametric model and the best-fitting parametric 
model S (l) : 



M, 



(i) 



IK(«)-" (nP) || 2 
1 



E 



all points 
((!,!>).,■) of g 



C^(Q((*.*>)j)-" (nP) ((*.ft)j) 



where n • m = 4800. 



1000 bootstrap replications M?^ we then 



From the B 

determine the empirical cumulative probability distribution 
function of the random quantity M 2 , shown in Fig. 3, for the 
parametric models A-D. Also indicated in Fig. 3 are the esti- 
mated distances M 2 between the non-parametric model and 
the best-fitting parametric models p^ of the COBE/DIRBE 
surface brightness, restricted to the grid Q, which is com- 
puted from the observed COBE/DIRBE data. 



5.3 P-value curves: Interpretation of the 
numerical results 

We now proceed by using the bootstrap replications for the 
parametric models A-D to answer two fundamental ques- 
tions: 

(i) Should the non-parametric model be preferred over 
the parametric models, i.e. does it improve the fit to the 
informative part (signal) of the data? 
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Figure 4. P-value curves ajv 

(n) = Fg (m 2 - n) for models A 

(full line), B (dashed), C (dot-dash-dot-dash), and D (dotted). 



(ii) How do the parametric models compare to each other, 
can we decide for a "best" among them, and - if yes - which 
one is it? 

We begin with some general remarks on our proposed 
method before we use it to answer these questions. The main 
methodology we propose is the use of P-value curves for the 
distance M 2 as graphical tools for illustrating the evidence 
for or against a model. To this end we plot the function 
Qjv(n) = F B {fN(M 2 - n)) for n > 0, i.e. the value of 
ajv(rt) is given by the probability that the random quantity 
X = y/N (M 2 - M 2 ) is smaller than (M 2 - U). Note 
that this implies that for II increasing ajv(n) decreases, be- 
cause we then evaluate the cumulative distribution function 
Fg(x) for decreasing x, and in particular, if ajv(n) is small, 
at the left tail of Fg. We present the P-value curves for the 
parametric models A-D in Fig. 4. 

The interpretation of the function qjv(II) is as follows. As- 
sume the true distance between a parametric model and the 
"true" density Kp is M 2 = II. Now we reject the hypotheses 
H : M 2 > II (vs. alternative M 2 < II) whenever ajv(n) < a 
for a given level of significance a. Hence 1 — Qjv(n) can 
be regarded as the estimated evidence in favour of the para- 
metric model U (up to a distance between parametric model 
and true density M 2 < IV). Finally the astrophysicist has to 
decide whether a value of M 2 = n should be regarded as sci- 
entifically negligible or as deviation from the "true" density 
Kp which is considered as significant by astrophysical rea- 
sons. For a more thorough introduction into P-value curves 
in the astrophysical context see [BM2], and for the statisti- 
cal theory see Munk (2002) Sect. 5 and the references given 
there. 

With this interpretation in mind we now determine "dis- 
tance margins" from the P-value curves in Fig. 4. These 
margins indicate the most likely distance between the true 
density and the individual parametric models. We use those 
distances n where the error probability for the hypothe- 
sis "the distance between parametric model and true den- 
sity Kp is larger than n" is w 95% and w 5% to define 
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the upper and lower bound of the distance margins (con- 
fidence interval), and find 6.5 < M 2 < 20.6 for model A, 
7.7 <M 2 < 22.5 for model B, 5.1 < M 2 < 20.9 for model C, 
and and 6.3 <M 2 < 16.7 for model D. 

Observe that our distance margins are much more meaning- 
ful than, say, computing the "classical" x 2 -distance between 
parametric model and data. This is in particular because 
the distance margins not only give the absolute distance be- 
tween the parametric model and the non-parametric one, 
but also error bounds for the distance. Furthermore, our 
method allows and adapts for the commonly poorly known 
and heteroscedastic distribution of observational noise - in 
contrast to classical x 2 — methods. 

It is important to comment on the validity of our approach 
of using the non-parametric model as an "objective stan- 
dard of measurement" in the context of the quality of the 
non-parametric model. To this end we consider two extreme 
cases. First, assume that the non-parametric model would 
fit the data rather bad relative to the estimated local stan- 
dard deviation a(l,b). In consequence the cumulative distri- 
bution function of M 2 ^ tends sharply to 1. This implies a 
P- value curve which goes to zero very quickly, and we have 
to conclude that the distance between parametric model and 
"true" density Kp is very small, and hence there is no rea- 
son to prefer non-parametric over parametric modelling of 
the data. On the other hand, now assume that the non- 
parametric model is a (nearly) perfect fit to the data rel- 
ative to the local standard deviation. Then the cumulative 
distribution function of M 2 ^, i.e. of the estimated distance 
between parametric model and "true" density, tends very 
slowly to 1 , implying a P- value curve which decreases to zero 
rather slowly. In this case we cannot make any statement 
about a small distance of the parametric model from the 
"true" density. In conclusion our new method even remains 
valid for "bad" non-parametric models serving as "objective 
standards of measurement" . 

We now use the distance margins to answer our first ques- 
tion - whether the non-parametric model should be pref- 
ered over the parametric models. To this end we have to 
decide whether we consider the determined distance mar- 
gin between a parametric model and the true density (here 
represented by the non-parametric model which is used as 
an objective standard of measurement) as astrophysically 
significant. For this we compare the distance margins with 
crude estimates for the distance between the non-parametric 
model and the COBE/DIRBE data on grid Q, and for the er- 
ror dispersion in the COBE/DIRBE L-band data, as given 
by Spergel et al. (1995). If these latter values are smaller 
than the typical distance margins we conclude that the lat- 
ter distances are astrophysically significant and the non- 
parametric model should be prefered. 

We begin with the computation of the distance between 
the non-parametric model and the COBE/DIRBE data on 
the grid Q, normalized by the estimated variance of the 
COBE/DIRBE L-band data, which is « 0.076 m (Spergel 



et al., 1995, cf. Bissantz & Gerhard, 2002). This normalisa- 
tion enables us to compare the above determined distances 
between the non-parametric model and the parametric mod- 
els, observing that the quantity M 2 is similar in construc- 
tion to a rms difference between the models (on the sky), 
"weighted" by the local variance of the data a 2 (I, b). We find 
0.62 for our crude estimate of the distance between the refer- 
ence non-parametric model and the observed data. This, and 
in particular the normalized dispersion of the data (which 
by our definition amounts to 1), is significantly less than the 
distance margins indicate for the parametric models. Thus 
we conclude that the non-parametric model improves signif- 
icantly over the parametric models and should be prefered. 

This conclusion is supported by Fig. 5 where the squared 
(rms) residuals between the reference non-parametric model 
and the COBE/DIRBE L-band map, and the same for the 
best parametric models p^ (the best of which is of type D, cf. 
Fig. 3), are compared. Obviously the non-parametric model 
fits the data very well, in particular in the central region 
|2| < 50 deg, |6| < 20deg. The residuals of the parametric 
model, however, show systematic deviations from the data. 
Note, however, that the P-value curves provide us with sub- 
stantially more information than these maps of model resid- 
uals, because they guard us against overfitting in a quanti- 
tative way. 

Having decided that the distance margins are of astrophysi- 
cal significance we proceed to the second question, which is 
to compare the parametric models among each others. This 
will be done for illustrational purpose mainly, because we 
have already seen that a non-parametric model should be 
prefered. To this end we use the right tail of the P-value 
curves, where ajv(n r ) ~0.05, i.e. we determine the distance 
n r for which the hypothesis "the distance between the para- 
metric model and the true density Kp is larger than II r " has 
an error probability ~95%. Inspection of Fig. 4 shows that 
the intermediate model D outperforms the other parametric 
models because II r is smallest for this model, indicating that 
it is the best among models A-D. The 2- and 4-armed models 
A and C are approximately similar in quality, and the model 
without spiral arms (B) is clearly the worst. We remark that 
the differences between the distances II r for the models (as 
given by the upper bound of the distance margins, cf. above) 
are significantly larger than the normalized distance between 
the non-parametric model and the data, and in particular 
than the normalized dispersion in the data. We conclude 
that the differences between the models are significant, and 
that the intermediate model D is best, i.e. we can exclude 
a large distance of the parametric model from the true MW 
density with a higher level of confidence for model D than 
for models A-C. 

Finally, we comment on the P-value curve analysis in the 
context of "classical" statistical tests, which start off from 
the hypothesis II = 0. Fig. 6 presents the left tail of our 
P-value curves, where II w 0, showing that some of the P- 
value curves intersect. This has an interesting consequence: a 
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Figure 5. Square (rms) difference between the best-fitting parametric models and the COBE/DIRBE L-band data, and the same 
difference for the reference non-parametric model. Contour spacing is 0.05mag and the bold contour is at O.lmag for for the rms 
plots (left hand side) and 0.5mag, Omag 2 for the model plots (right hand side). Contour lines in the rms plots have been smoothed by 
averaging over a point and its four nearest neighbours. Different areas of the sky are shown for the rms plots (the region of sky to which 
the parametric models were fitted) and the model plots (full sky area covered by our COBE/DIRBE map). Note the different appearance 
of spiral arm tangent points for the different models, and the corresponding residuals in the rms maps. 
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n 

Figure 6. P-value curves ajv (n) = F* (m 2 - n) for models A 
(full line), B (dashed), C (dot-dash-dot-dash) , and D (dotted), 
respectively, where El ss 0. 



"classical" test based on the hypothesis that the model holds 
(LT = 0) leads to a different answer (here: model C performs 
best) than our proposed method, which is based on the more 
realistic hypothesis, that there is a nonzero distance LT > 
between the parametric model and the true MW density 
Kp. Observe that, because we have found in our precedent 
analysis that the observed distances between the parametric 
models and the true MW density are significant, a classical 
test would indeed be inappropriate in our application. 

To close this section we summarize the answer to our ques- 
tions raised at the beginning of this section: 

(i) The non-parametric model should be prefered over the 
parametric models because the improvement in fit is astro- 
physically relevant and with high confidence due to system- 
atic features. 

(ii) A parametric model with 4-armed spiral structure, 
however, with reduced amplitude of the Sagittarius-Carina 
arm and its counter-arm as suggested by Drimmel & Spergel 
(2001), significantly outperforms the other parametric mod- 
els. The worst performance is shown by a model without 
spiral arms at all as to be expected. 



6 DISCUSSION 

We have suggested a resampling algorithm to assess the ne- 
cessity of non-parametric instead of parametric modelling 
in astrophysical (inverse) regression problems. By means of 
this we are in the position to decide whether the deviations 
between the parametric model and the data are systematic 
or due to noise. Furthermore our method can be used to 
select the best among several competing parametric mod- 
els. Our approach is based on the idea to investigate the 
statistical behaviour of the estimated distance between the 
true model p and the artifical model U under all "possible 
worlds" M 2 — LT, and not only when M 2 = (i.e. p and U 
coincide), as classical goodness of fit tests do. Moreover, we 



compare all parametric models by relating them to a non- 
parametric "super-model" which can be validated itself by 
our method. 

To illustrate our method we have applied it to the problem 
of recovering the near-infrared luminosity density distribu- 
tion of the Milky Way from a dust-corrected COBE/DIRBE 
L-band map (Spergel et al., 1995). In this paper we have fo- 
cused on the morphology of the spiral arms, comparing para- 
metric models which have zero, 2 or 4 spiral arms, and also 
an "intermediate" 4-armed model, in which the Sagittarius- 
Carina arm and its counter-arm are considerably less strong 
than the other pair of arms. These parametric models have 
been compared with a non-parametric model of Bissantz & 
Gerhard (2002). From our statistical analysis we conclude 
that the non-parametric model is significantly better than 
the parametric models and hence should be prefered. This is 
due to systematic departures between data and parametric 
models, which are in particular to inflexible to reproduce cer- 
tain deviations from a double-exponential disk and smooth 
spiral arms. Furthermore, we have found that the "interme- 
diate" parametric model outperforms the other parametric 
models by a significant amount. Thus, from the analysis of 
the dust-corrected COBE/DIRBE L-band map, a paramet- 
ric model of the Milky Way with 4-arms, but the Sagittarius- 
Carina arm (and in our model also its counter-arm) of re- 
duced amplitude - similar to the suggestion of Drimmel & 
Spergel (2001) - is to be prefered over models with 2-armed 
or 4-armed structure, and in particular over a model without 
spiral structure, which performed worst in our analysis. 

Before making our final conclusions we want to point out 
some difficulties of the COBE/DIRBE NIR data with re- 
spect to spiral arm analysis as pointed out by a referee. 
Firstly, dust extinction, which is biased towards the spi- 
ral arms, makes those less evident in NIR data (cf. Drim- 
mel & Spergel, 2001). Our analysed L-band map was dust- 
corrected by Spergel et al. (1995), but this extinction-bias 
obviously also results in their dust-correction being more 
difficult to perform. Secondly, the large angular size of the 
nearby Sagittarius-Carina arm makes it more difficult to ob- 
serve in NIR maps, because combined with the small instru- 
ment beam a larger amount of emission will be lost below 
the sensitivity threshold of the instrument than for the other 
arms. Finally, since we had to restrict our parametric fitting 
to a range of longitudes \l\ < 40deg of the data due to com- 
putational reasons (range of the non-parametric model), we 
did not include the tangent point regions of the Sagittarius- 
Carina arm, which substantially weakens any conclusion re- 
garding a differing amplitude of this arm. The latter two 
points obviously weaken the statistical evidence for the in- 
termediate model compared to the 4-armed model by some 
amount. 

This result is consistent with Drimmel & Spergel (2001), 
and also with the SPH models of the inner Milky Way gas 
dynamics of Bissantz & Gerhard (2002), who found that 
4-armed spiral structure is needed in the gravitational po- 
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tential caused by the stellar distribution. They analysed the 
gas flow in a potential ("mix"), which is quite similar to our 
intermediate model. This gas model was found to be slightly 
worse than gas flow models in their standard 4-armed po- 
tential, but not nearly as bad as the gas flow in 2-armed 
potentials, and is clearly still acceptable. 
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